Many models are too complex for quadratic approximation or other short cuts.
We need a general method of numeric integration to explore the posterior effectively.
MCMC \(\lim_{t\to\infty}\) posterior
Monte Carlo: “repeated random sampling to obtain numerical results” -Wikipedia
Markov chain: stochastic model in which state at time \(t\) depends only on previous state at \(t-1\)
Imagine this one-dimensional posterior
Imagine this one-dimensional posterior:
Now imagine that we cannot see it!
We can only calculate the heights given particular guesses of \(\theta\)
Imagine we take a guess for \(\theta\) This is the first step of our (Markov) chain
We can then find the height for this guess
height = likelihood of data given \(\theta\) \(\times\) prior of \(\theta\)
= \(\text{Pr}(x|\theta) \times \text{Pr}(\theta)\).
Choose a proposal for a new value on the chain. - Our proposal distribution (blue) is centered on our last guess (gray). - Can be any symmetric shape.
Find the height (posterior-ish value) for our proposal
Calculate the ratio of those heights: \[ r = \frac{\text{Pr}(x|\theta_p)\text{Pr}(\theta_p)}{\text{Pr}(x|\theta_{t})\text{Pr}(\theta)} = \frac{0.078}{0.07} = 1.124 \]
Choose a random uniform variable, \(U \sim \text{Uniform}(0,1)\)
If \(r > U\), accept the proposal & add it to the chain. If \(r < U\), reject it, add current value to chain, & try a new proposal.
We want to calculate the ratio of posterior probabilities:
\[ r = \frac{\text{Pr}(\theta_p | x)}{\text{Pr}(\theta_{t} | x)} = \frac{\frac{\text{Pr}(x|\theta_p)\text{Pr}(\theta_p)}{\text{Pr}(x)}}{\frac{\text{Pr}(x|\theta_{t})\text{Pr}(\theta_{t})}{\text{Pr}(x)}} \] but we do not know (and cannot calculate) \(\text{Pr}(x)\).
However, they cancel out! We just need the numerators! \[ r = \frac{\text{Pr}(\theta_p | x)}{\text{Pr}(\theta_{t} | x)} = \frac{\text{Pr}(x|\theta_p)\text{Pr}(\theta_p)}{\text{Pr}(x|\theta_t)\text{Pr}(\theta_t)} \]
\[ r = \frac{0.061}{0.078} = 0.781 ; U = 0.89 \]
Since \(r < U\), reject proposal, add current value to chain, & try a new proposal.
\[ r = \frac{0.079}{0.078} = 1.004 \] \(U=0.212\)
Since \(r > U\), accept proposal & add to chain
It works!
This is the Metropolis Algorithm
What if we get proposals outside of possible boundaries?
However, proposal distribution is \(a\)symmetric:
\[ \text{Pr}(\theta_{t-1} \rightarrow \theta_p) \neq \text{Pr}(\theta_{p} \rightarrow \theta_{t-1}) \]
Adjust the ratio to: \[ r = \frac{\text{Pr}(x|\theta_p)\text{Pr}(\theta_p)}{\text{Pr}(x|\theta_{t-1})\text{Pr}(\theta_{t-1})}\times \frac{\text{Pr}(\theta_{t-1}|\theta_p)}{\text{Pr}(\theta_p|\theta_{t-1})} \]
and it works!
A widespread alternative is Gibbs sampling (WinBugs, JAGS)
Like moving along posterior one dimension at a time
Suffers from correlation among variables
Imagine a proposed jump from one of these points could easily be far from posterior density
Gibbs sampling gets slowed down, too.
Hamiltonian Monte Carlo
First, take the negative log, so high probability = low areas
First, take the negative log, so high probability = low areas
But remember, we cannot see it
We can only calculate the height given a value of \(\theta\)
Take a first guess
…and give it a bit of momentum (in a direction of parameter space)
Distribution of momentum comes from standard (multivariate) normal distribution
Track it’s movement for a certain amount of “time” using Hamiltonian equations
\[ \begin{align} \text{Total Energy} &= \text{Potential} + \text{Kinetic}\\ E &= U(\theta) + K(\text{momentum}) \end{align} \]
After some pre-determined amount of time, the position of the point is the new proposal
Track it’s movement for a certain amount of “time” using Hamiltonian equations
\[ \begin{align} \text{Total Energy} &= \text{Potential} + \text{Kinetic}\\ E &= U(\theta) + K(\text{momentum}) \end{align} \]
Proposals tend to be in areas with higher probability density
No analytic solution to Hamiltonian for most problems, so:
No analytic solution to Hamiltonian for most problems, so:
Proposal acceptance actually reflects the difference in energy (i.e., how well we followed the path of the ball)
Assumes a continuous likelihood surface, so no discrete parameters allowed!
Similarly, very sharp surfaces cause problems and produce divergences
Overall, each proposal requires many more calculations (many steps, calculating kinetic energy & momentum at each), but proposals are much better / rarely rejected, so much more efficient overall